Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression
Low-distortion embeddings are critical building blocks for developing random
sampling and random projection algorithms for linear algebra problems. We show
that, given a matrix A \in \R^{n \times d} with n \gg d and a p \in [1, 2), with constant probability, we can construct a low-distortion embedding
matrix \Pi \in \R^{O(\poly(d)) \times n} that embeds \A_p, the \ell_p
subspace spanned by A's columns, into (\R^{O(\poly(d))}, \| \cdot \|_p);
the distortion of our embeddings is only O(\poly(d)), and we can compute \Pi A in O(\nnz(A)) time, i.e., input-sparsity time. Our result generalizes the
input-sparsity time \ell_2 subspace embedding by Clarkson and Woodruff
[STOC'13]; and for completeness, we present a simpler and improved analysis of
their construction for \ell_2. These input-sparsity time \ell_p embeddings
are optimal, up to constants, in terms of their running time; and the improved
running time propagates to applications such as (1 \pm \epsilon)-distortion \ell_p
subspace embedding and relative-error \ell_p regression. For
\ell_2, we show that a (1+\epsilon)-approximate solution to the \ell_2
regression problem specified by the matrix A and a vector b \in \R^n can be
computed in O(\nnz(A) + d^3 \log(d/\epsilon) /\epsilon^2) time; and for
\ell_p, via a subspace-preserving sampling procedure, we show that a (1 \pm \epsilon)-distortion embedding of \A_p into \R^{O(\poly(d))} can be
computed in O(\nnz(A) \cdot \log n) time, and we also show that a
(1+\epsilon)-approximate solution to the \ell_p regression problem \min_{x \in \R^d} \|A x - b\|_p can be computed in O(\nnz(A) \cdot \log n + \poly(d)
\log(1/\epsilon)/\epsilon^2) time. Moreover, we can improve the embedding
dimension or equivalently the sample size to without increasing the complexity.
Comment: 22 pages
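The idea of input-sparsity time can be illustrated with a minimal sketch of a CountSketch-style sparse embedding for the \ell_2 case, in which the embedding matrix has a single +1 or -1 entry per column. This is an illustration only and not the \ell_p embedding of the abstract: the function name `countsketch`, the `(i, j, value)` triple format for the nonzeros of A, and the sketch size are all assumptions made here.

```python
import random


def countsketch(triples, n_rows, sketch_rows, seed=0):
    """Apply a CountSketch-style embedding Pi to a sparse matrix A.

    triples: iterable of (i, j, value) nonzeros of an n_rows x d matrix A.
    Returns (sketch, bucket, sign), where sketch is a dict {(r, j): value}
    holding the entries of Pi*A, and bucket/sign define Pi.
    """
    rng = random.Random(seed)
    # Each input row i is sent to one sketch row with a random sign,
    # so Pi has exactly one nonzero (+1 or -1) in each column.
    bucket = [rng.randrange(sketch_rows) for _ in range(n_rows)]
    sign = [rng.choice((-1.0, 1.0)) for _ in range(n_rows)]
    sketch = {}
    # A single pass over the nonzeros: work proportional to nnz(A)
    # once the hash and sign arrays have been drawn.
    for i, j, v in triples:
        key = (bucket[i], j)
        sketch[key] = sketch.get(key, 0.0) + sign[i] * v
    return sketch, bucket, sign


# Hypothetical 5 x 2 sparse matrix, sketched down to 3 rows.
A = [(0, 0, 1.0), (1, 1, 2.0), (2, 0, -3.0), (3, 1, 4.0), (4, 0, 5.0)]
SA, bucket, sign = countsketch(A, n_rows=5, sketch_rows=3)
```

Because every nonzero of A is touched exactly once, the cost of forming the sketch scales with the number of nonzeros rather than with n times d.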
Phenome wide association study of vitamin D genetic variants in the UK Biobank cohort
Introduction
Vitamin D status is an important public health issue due to the high prevalence of
vitamin D insufficiency and deficiency, especially in high latitude areas. Furthermore,
it has been reported to be associated with a number of diseases. In a previous umbrella
review of meta-analyses of randomized clinical trials (RCTs) and of observational
studies, plasma/serum 25-hydroxyvitamin D (25(OH)D) or supplemental vitamin D
was linked to more than 130 unique health outcomes.
However, most of the studies yielded conflicting results, and no association was
convincing.
Aim and Objectives
The aim of my PhD was to comprehensively explore the association between vitamin
D and multiple outcomes. The specific objectives were to: 1) update the umbrella
review of meta-analyses of observational studies or randomized controlled trials on
associations between vitamin D and health outcomes published between 2014 and
2018; 2) conduct a systematic literature review of previous Mendelian Randomization
studies on causal associations between vitamin D and all outcomes; 3) conduct a
systematic literature review of published phenome wide association studies,
summarizing the methods, results and predictors; 4) create a polygenic risk score of
vitamin D related genetic variants, weighted by their effect estimates from the most
recent genome wide association study; 5) encode phenotype groups based on
electronic medical records of participants; 6) study the associations between vitamin
D related SNPs and the whole spectrum of health outcomes, defined by electronic
medical records utilising the UK Biobank study; 7) explore the causal effect of 25-
hydroxyvitamin D level on health outcomes by applying novel instrumental variable
methods.
Methods
First, I updated the vitamin D umbrella review published in 2015 by summarizing the
evidence from meta-analyses of observational studies and meta-analyses of RCTs
published between 2014 and 2018. I also performed a systematic literature review of
all previous Mendelian Randomization studies on the effect of vitamin D on all health
outcomes, as well as a systematic review of all published PheWAS studies and the
methodology they applied. Then I conducted original data analysis in a large
prospective population-based cohort, the UK Biobank, which includes more than
500,000 participants. A 25(OH)D genetic risk score (weighted sum score of 6 serum
25(OH)D-related SNPs: rs3755967, rs12785878, rs10741657, rs17216707,
rs10745742 and rs8018720, as identified by the largest genome wide association study
of 25(OH)D levels) was constructed to be used as the instrumental variable. I used a
phenotyping algorithm to code the electronic medical records (EMR) of UK Biobank
participants into 1853 distinct disease categories and I then ran the PheWAS analysis
to test the associations between the 25(OH)D genetic risk score and 920 disease
outcome groups (i.e. outcomes with more than 200 cases). For phenotypes found to
show a statistically significant association with 25(OH)D levels in the PheWAS or
phenotypes which were found to be convincing or highly suggestive in previous
studies, I developed an extended case definition by incorporating self-reported data
collected by UK Biobank baseline questionnaire and interview. The possible causal
effect of vitamin D on those outcomes was then explored by the MR two-stage method,
inverse variance weighted MR and Egger’s regression, followed by sensitivity
analyses.
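The weighted genetic risk score described above can be sketched as a weighted sum of effect-allele dosages over the six listed SNPs. This is a minimal illustration, not the thesis code: the function name `genetic_risk_score`, the dictionary layout, and above all the weights are placeholders introduced here, since the GWAS effect estimates are not given in the text.

```python
# The six 25(OH)D-related SNPs named in the Methods.
SNPS = ["rs3755967", "rs12785878", "rs10741657",
        "rs17216707", "rs10745742", "rs8018720"]


def genetic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per SNP)."""
    missing = [s for s in SNPS if s not in dosages or s not in weights]
    if missing:
        raise KeyError(f"missing SNPs: {missing}")
    return sum(weights[s] * dosages[s] for s in SNPS)


# Placeholder weights, NOT the published GWAS effect estimates.
example_weights = {s: 0.1 for s in SNPS}
# Hypothetical genotype of one participant.
example_dosages = {"rs3755967": 2, "rs12785878": 1, "rs10741657": 0,
                   "rs17216707": 1, "rs10745742": 2, "rs8018720": 0}
score = genetic_risk_score(example_dosages, example_weights)
```

In a real analysis the weights would be the per-allele effect estimates from the genome wide association study, and the resulting score would serve as the instrumental variable.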
Results
In the updated systematic literature review of meta-analyses of observational studies
or RCTs, only studies on new outcomes which had not been covered by the previous
umbrella review were included. A total of 95 meta-analyses met the inclusion criteria.
Among the included studies there were 66 meta-analyses of observational studies, and
29 meta-analyses of RCTs. Eighty-five new outcomes were explored by meta-analyses
of observational studies, and 59 new outcomes were covered by meta-analyses of
RCTs.
In the systematic review of published Mendelian Randomization studies on vitamin D,
a total of 29 studies were included. A causal role of 25(OH)D level was supported by
MR analysis for the following outcomes: type 2 diabetes, total adiponectin, diastolic
blood pressure, risk of hypertension, multiple sclerosis, Alzheimer’s disease, all-cause
mortality, cancer mortality, mortality excluding cancer and cardiovascular events,
ovarian cancer, HDL-cholesterol, triglycerides and cognitive functions.
For the systematic literature review of published PheWAS studies and their
methodology, a total of 45 studies were included. Implementing a
PheWAS study involves the following steps: sample selection, predictor selection,
phenotyping, statistical analysis and result interpretation. One of the main challenges
is the definition of the phenotypes (i.e., the method of binning participants into
different phenotype groups). In the phenotyping step, ICD-curated phenotyping was
widely used in previous PheWAS studies, and I also used it in my own analysis.
By applying the ICD-curated phenotyping, 1853 phenotype groups were defined in the
participants I used. Only phenotype groups with more than 200 cases were
analysed in the PheWAS (920 phenotypes). Only the associations between rs17216707
(CYP24A1) and “calculus of ureter” (beta = -0.219, se = 0.045, P = 1.14 × 10^-6), “urinary
calculus” (beta = -0.129, se = 0.027, P = 1.31 × 10^-6) and “alveolar and parietoalveolar
pneumonopathy” (beta = 0.418, se = 0.101, P = 3.53 × 10^-5) survived Bonferroni
correction.
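As a rough check on the reported associations, the p-values can be recomputed from the effect estimates and standard errors with a two-sided Wald test and compared against a Bonferroni threshold. This is a sketch under assumptions: the normal approximation, the significance level of 0.05, and the use of 920 tests (the number of phenotypes analysed) are choices made here, and the text does not state the exact denominator used for the correction. Because the inputs are rounded, the recomputed values only approximate the reported ones.

```python
import math


def two_sided_p(beta, se):
    """Two-sided p-value of a Wald z-test under a normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# (beta, se) pairs as reported for rs17216707 (CYP24A1).
reported = {
    "calculus of ureter": (-0.219, 0.045),
    "urinary calculus": (-0.129, 0.027),
    "alveolar and parietoalveolar pneumonopathy": (0.418, 0.101),
}

# Assumed alpha and number of tests; not stated in the text.
threshold = bonferroni_threshold(0.05, 920)
for name, (beta, se) in reported.items():
    p = two_sided_p(beta, se)
    print(f"{name}: p = {p:.2e}, below threshold = {p < threshold}")
```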
Nine outcomes, including systolic blood pressure, diastolic blood pressure, body mass
index, risk of hypertension, type 2 diabetes, ischemic heart disease, depression, non-vertebral
fracture and all-cause mortality were explored in MR analyses. The MR
analysis had more than 80% power for detecting a true odds ratio of 1.2 or larger for
binary outcomes. None of the explored outcomes reached statistical significance. Results
from multiple MR methods and sensitivity analyses were consistent.
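One of the MR methods named in the Methods, inverse variance weighted MR, can be sketched in its standard fixed-effect form, which combines per-SNP effects on the exposure and on the outcome. The function name `ivw_estimate` and the input values below are hypothetical; this is not the thesis analysis and uses no UK Biobank data.

```python
import math


def ivw_estimate(bx, by, se_by):
    """Fixed-effect inverse-variance-weighted MR estimate.

    bx: SNP-exposure effects, by: SNP-outcome effects,
    se_by: standard errors of the SNP-outcome effects.
    Returns (causal estimate, standard error).
    """
    if not (len(bx) == len(by) == len(se_by)) or not bx:
        raise ValueError("inputs must be non-empty and of equal length")
    # Weighted regression of by on bx through the origin,
    # with weights 1 / se_by^2.
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_by))
    return num / den, math.sqrt(1.0 / den)


# Made-up summary statistics for three hypothetical SNPs.
estimate, std_err = ivw_estimate(
    bx=[0.1, 0.2, 0.3], by=[0.05, 0.1, 0.15], se_by=[0.01, 0.02, 0.01]
)
```

Egger's regression differs from this form mainly by adding an intercept term, which is why running both is a common consistency check.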
Discussion
Vitamin D and its association with multiple outcomes has been widely studied. More
than 230 outcomes have been linked with vitamin D by meta-analyses of observational
studies and RCTs. In contrast, evidence from Mendelian Randomization studies
is lacking. In particular, I identified only 20 existing MR studies and only 13 outcomes
were suggested to be causally related to vitamin D. In the systematic literature review
of previous PheWAS studies, I summarized the applied methods, predictors and results.
Although phenotyping based on ICD codes performed well and was
widely applied in previous PheWAS studies, phenotyping could be improved by incorporating lab data,
imaging data and medical notes. Alternative algorithms that
take advantage of deep learning, and thus enable high-precision phenotyping, need to
be developed.
From the PheWAS analysis, the score of vitamin D related genetic variants was not
statistically significantly associated with any of the 920 phenotypes tested. In the
single variant analysis, only rs17216707 (CYP24A1) showed statistically significant
associations with calculus outcomes. Previous studies reported associations
between vitamin D and hypercalcemia, hypercalciuria, nephrolithiasis and
nephrocalcinosis, which may be due to the role of vitamin D in calcium homeostasis.
In the MR analysis, I found no evidence of large to moderate (OR > 1.2) causal
effects of vitamin D on a very wide range of health outcomes. These included
SBP, DBP, hypertension, T2D, IHD, BMI, depression, non-vertebral fracture and
all-cause mortality, which have previously been proposed to be influenced by low vitamin
D levels. Further, even larger studies, probably involving the joint analysis of data
from several large biobanks with future IVs that explain a higher proportion of the trait
variance, will be required to exclude smaller causal effects, which could have public
health importance because of the high prevalence of low vitamin D levels
in some populations.
Numerical Study on Reasonable Entry Layout of Lower Seam in Multi-seam Mining
Abstract: Based on the geological conditions of the 6# and 8# coal seams in Xieqiao Coal Mine, the reasonable entry layout of the lower seam in multi-seam mining is studied by FLAC3D numerical simulation. Three entry layouts are put forward for discussion: alternate internal entry layout, alternate exterior entry layout and overlapping entry layout. The stress distribution and displacement characteristics of the surrounding rock are then analyzed for the three layouts by numerical simulation, leading to the conclusion that the alternate internal entry layout, which places the entry in the stress-reduced zone and avoids, to a certain extent, the influence of the abutment pressure from mining the upper coal seam, is the better choice for multi-seam mining. These results can offer a useful reference for entry layout under similar geological conditions in multi-seam mining.
Natural convection heat transfer of a straight-fin heat sink
This paper investigates, by numerical simulation and experimental testing, the influence of mounting angle on the heat dissipation performance of a heat sink under natural convection. The heat sink achieves its highest cooling power at a mounting angle of 90° and its lowest at 15°, where the cooling power is 6.88% lower than at 90°. A heat transfer stagnation zone is the main factor affecting the cooling power of the heat sink, and its location and area vary with the mounting angle. Cutting the heat transfer stagnation zone is identified as an effective way to improve heat sink performance.
A Distributed Graph Approach for Pre-processing Linked RDF Data Using Supercomputers
Efficient graph-based RDF queries are becoming more pertinent given the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed graph approach to pre-processing linked data. Instead of traversing the memory graph, our system indexes pre-processed join elements that are organized in a graph structure. We analyze the DBpedia dataset (derived from the Wikipedia corpus) and compare our access method to a graph traversal access approach, which we also devise. Our experimental results show that the distributed, pre-processed graph approach to accessing linked data is faster than the traversal approach over a specific range of linked queries.
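The contrast between traversing the graph at query time and looking up pre-processed join elements can be illustrated with a toy single-machine sketch. This is not the paper's distributed system: the `(subject, predicate, object)` tuple format, the function names, the restriction to two-hop joins and the example triples are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def two_hop_by_traversal(adj, start, p1, p2):
    """Find every x with start -p1-> m -p2-> x by walking the graph."""
    return sorted({o2
                   for pa, m in adj.get(start, ()) if pa == p1
                   for pb, o2 in adj.get(m, ()) if pb == p2})


def build_join_index(triples):
    """Pre-compute every two-hop join once: (s, p1, p2) -> set of x."""
    adj = build_adjacency(triples)
    index = defaultdict(set)
    for s, p1, m in triples:
        for p2, x in adj.get(m, ()):
            index[(s, p1, p2)].add(x)
    return index


# Hypothetical triples.
triples = [("a", "knows", "b"), ("b", "livesIn", "c"),
           ("a", "knows", "d"), ("d", "livesIn", "e"),
           ("b", "worksAt", "f")]
adj = build_adjacency(triples)
index = build_join_index(triples)
by_walk = two_hop_by_traversal(adj, "a", "knows", "livesIn")
by_lookup = sorted(index[("a", "knows", "livesIn")])
```

The index trades pre-processing time and memory for a constant-time lookup per query, which is consistent with the abstract's claim that the advantage holds only over a specific range of queries.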